Abstract

We put under experimental scrutiny the preferential attachment model that is commonly
accepted as a generating mechanism of scale-free complex networks. To this end we chose the
citation network of Physics papers and traced the citation history of 40,195 papers published in
one year. Contrary to common belief, we found that the citation dynamics of individual papers
follows superlinear preferential attachment, with the exponent $\alpha = 1.25$-$1.3$. Moreover, we
showed that the citation process cannot be described as a memoryless Markov chain since there is
substantial correlation between the present and recent citation rates of a paper. Based on our
findings we constructed a stochastic growth model of the citation network, performed numerical
simulations based on this model and achieved excellent agreement with the measured citation
distributions.

The field of growing complex networks (informational, social, biological, etc.) has attracted
increasing interest in the physics community during the past decade 1^. Many of these
networks are believed to achieve a stationary state and to become scale-free. The static
characteristics of growing networks, such as clustering coefficient, community structure, and
degree distribution, were extensively studied both theoretically and empirically [lifl, while
the dynamics of these networks was studied mostly theoretically. It is widely believed that
they are generated by the preferential attachment [1] (cumulative advantage [5]) mechanism.
The latter assumes that new links are distributed between existing nodes with probability
$\Pi_i = \lambda_i / \sum_j \lambda_j$, where $\lambda_i$ is the attractivity, i.e., the expected number of links
acquired by a node $i$ in a short time interval $\Delta t$ jl|. From the perspective of a single node,
the number of incoming links grows according to the inhomogeneous Markov process with the rate

$$\lambda_i = A(t)\,(k_i + k_0)^{\alpha} \qquad (1)$$

where $k_i$ is the number of existing links, $k_0$ is the "initial attractivity", $\alpha$ is the
attachment exponent, $t$ is the age of the node, and $A(t)$ is the aging function. In fact,
Eq. (1) describes the stochastic multiplicative growth process

$$\Delta k_i = \lambda_i \Delta t + \sigma\, dW(t) \qquad (2)$$

where $\Delta k_i$ is the actual number of newly acquired links during the time interval $\Delta t$ and
$\sigma\, dW(t)$ is its stochastic component.
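
As a minimal sketch of how Eq. (1) translates into attachment probabilities, the snippet below evaluates $\lambda_i$ and $\Pi_i = \lambda_i/\sum_j \lambda_j$ for a handful of nodes. The function names, the three node degrees, the constant aging function and the value $k_0 = 1$ are illustrative assumptions, not taken from the measurements.

```python
def attachment_rate(k, t, alpha=1.0, k0=1.0, aging=lambda t: 1.0):
    """Eq. (1): lambda_i = A(t) * (k_i + k0)**alpha."""
    return aging(t) * (k + k0) ** alpha


def attachment_probabilities(degrees, t, **params):
    """Pi_i = lambda_i / sum_j lambda_j for a list of node degrees."""
    rates = [attachment_rate(k, t, **params) for k in degrees]
    total = sum(rates)
    return [r / total for r in rates]


# Hypothetical nodes with 0, 9 and 99 existing links.
degrees = [0, 9, 99]
linear = attachment_probabilities(degrees, t=1, alpha=1.0)
superlinear = attachment_probabilities(degrees, t=1, alpha=1.25)
```

With these toy numbers the best-connected node receives a larger share of new links for $\alpha = 1.25$ than for $\alpha = 1$, which is the qualitative difference between superlinear and linear attachment.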

The direct way to verify Eq. (1) is to measure $\Delta k_i$-distributions for the sets of nodes with
the same degree $k$, to find $\lambda = \overline{\Delta k_i}$, and to check how $\lambda$ depends on $k$. Previous studies that
were aimed at this goal SHul] focused on the citations to scientific papers as one of the
best documented networks and a prototype for the study of the dynamic behavior of growing
networks [12]. Since the above studies were restricted to relatively small or inhomogeneous
data sets, they had to apply indirect averaging procedures, such as numerical integration
or moving average [10]. These procedures are prone to quantization errors and
yield inconclusive results.

Our goal is the direct measurement of the average growth rate of the node degree in a
complex network (Eq. (1)) and the assessment of its stochastic part (Eq. (2)) as well.
Following the accepted practice [11], we chose a network of citations to scientific papers. We
performed a high-statistics and time-resolved study of the citation dynamics of a very large

FIG. 1: Statistical distribution of additional citations $\Delta k_i$ accumulated during a time window of
$\Delta t = 1$ year. Continuous lines show fits to the negative binomial distribution; $k$ is the number of
previous citations and $t$ is the number of years after publication.

set of papers that is field- and age-homogeneous (one scientific discipline, one publication
year). Based on our findings we constructed a stochastic model of citation dynamics with
no "hidden" parameters such as fitness [13] or relevance [l^. Then we performed a numerical
simulation based on our model and verified that the real and simulated citation networks
have the same microscopic and macroscopic characteristics.

We used the Thomson-Reuters ISI Web of Science, chose 82 leading Physics journals,
excluded review articles, comments, editorials, etc., and analyzed the citation history of all
40,195 original research Physics papers published in these journals in one year, 1984. For each
paper $i$ we determined $k_{i,t}$, the total number of citations accumulated after $t$ years
($t = T_{cit} - T_{publ} + 1$), and $\Delta k_{i,t}$, the number of citations gained by the same paper in the
year $t + 1$. For every citing year $t$ we grouped all papers into ~40 logarithmically spaced bins,
each bin containing the papers with close $k$. Figure 1 shows the statistical distributions of
$\Delta k_i$ for several such bins and for two selected years. For each bin we found the mean,
$\lambda(k) = \overline{\Delta k_i}$, and the variance, $\sigma^2 = \overline{(\Delta k_i - \lambda)^2}$.
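
The binning step described above can be sketched as follows. This is an illustration only: the input format (pairs of previous citations and citations gained in the next year), the helper names and the bin width `bins_per_decade` are assumptions, and the toy records are invented, not drawn from the data set.

```python
import math
from collections import defaultdict


def log_bin_index(k, bins_per_decade=10):
    """Index of the logarithmic bin containing k (k = 0 gets its own bin, -1)."""
    if k == 0:
        return -1
    return int(math.floor(bins_per_decade * math.log10(k)))


def binned_mean_variance(records, bins_per_decade=10):
    """records: iterable of (k, dk) pairs for one citing year.

    Returns {bin_index: (n_papers, mean_dk, variance_dk)}.
    """
    groups = defaultdict(list)
    for k, dk in records:
        groups[log_bin_index(k, bins_per_decade)].append(dk)
    stats = {}
    for b, values in groups.items():
        n = len(values)
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n
        stats[b] = (n, mean, var)
    return stats


# Hypothetical toy input: (previous citations k, citations gained next year).
toy = [(0, 0), (0, 1), (10, 2), (11, 4), (100, 9), (105, 15)]
stats = binned_mean_variance(toy)
```

Each entry of `stats` holds the number of papers in the bin together with the sample mean and variance of their additional citations, i.e. the two quantities extracted from every measured distribution.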

Figure 2 shows that the $\lambda(k)$ dependence is well accounted for by Eq. (1), where $A$, $k_0$, and
$\alpha$ are fitting parameters. We found that (i) the aging function follows the power-law decay,
A = 3.54/(t + 0.3)^; (ii) the initial attractivity is almost time-independent, $k_0 \approx 1.1$; (iii) the
exponent $\alpha$ gradually increases with time from $\alpha = 1$ to $\alpha = 1.25$. Although the deviation

FIG. 2: Left panel: mean annual citation rate, $\lambda(k) = \overline{\Delta k_i}$, as a function of the number of previous
citations $k$; $t$ is the number of years after publication. The continuous lines show the superlinear fit,
$\lambda = A(k + k_0)^{\alpha}$, where $k_0 = 1$ and $\alpha$ is shown in the inset. Right panel: the same data on a
linear scale. The intercept of the continuous lines with the horizontal axis yields the time-independent $k_0$.

of $\alpha$ from unity is small, it is significant and contrasts with the assumption of linearity commonly
accepted by the practitioners of the preferential attachment model [l|,Q,|5|,y,[3. Indeed,
while linear preferential attachment generates a scale-free network with a power-law
degree distribution, superlinear preferential attachment tends to generate a
"winner-takes-all" network [2], [6].

For comparison, we performed similar measurements for the Mathematics and Economics
papers published in the same year (1984). We found that the citation dynamics for both
these disciplines is also well accounted for by Eq. (1). The $\alpha$ and $k_0$ turn out to be almost
the same as those for Physics while the aging function $A(t)$ is different (see Supplementary
Material). Similar $\alpha$ and $k_0$ were found in the US patent citation studies [loj]. This suggests
a universal microscopic mechanism of citation accumulation, whereas the variations in total
citation counts between scientific fields can be attributed to different initial conditions (the
number of citations gained during the first couple of years after publication) and to different
growth rates of the number of publications.

In what follows we analyze another key ingredient of the preferential attachment model:
the Markov chain assumption. Since Eq. (1) postulates that the citation rate $\lambda = \overline{\Delta k_i}$ depends
only on the number of previous citations $k$, it follows that the statistical distribution of

FIG. 3: The variance-to-mean ratio (Fano factor), $F = \sigma^2/\lambda$, for the statistical distributions of
additional citations $\Delta k_i$ (see Fig. 1). Each point corresponds to the set of papers with the same
number of expected citations $\lambda_i$, given by Eq. (1) (the red squares). The data, especially for $k > 60$,
deviate upwards from the $F = 1$ line, characteristic of the Poisson distribution. The blue circles
show the variance-to-mean ratio for the $\Delta k_i$-distributions for the sets of papers with the same
number of expected citations $\lambda_i$, Eq. (6). These data are closer to the $F = 1$ line.

additional citations $\Delta k_i$, gained by the papers with the same $k$ during a time window $\Delta t$,
should be nothing else but the Poissonian:

$$P(\Delta k_i) = \frac{\lambda^{\Delta k_i}}{\Delta k_i!}\, e^{-\lambda} \qquad (3)$$
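
A quick numerical illustration of what the Poisson form implies: for a Poisson variable the variance equals the mean, so the variance-to-mean ratio is 1, whereas a set of papers with unequal underlying rates gives a ratio above 1. The sketch below uses only the Python standard library; the sampler (Knuth's multiplication method), the sample sizes and the two-rate mixture are assumptions chosen for illustration, not the measured data.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson variate (Knuth's method; fine for small lam)."""
    limit, product, count = math.exp(-lam), rng.random(), 0
    while product > limit:
        product *= rng.random()
        count += 1
    return count


def variance_to_mean(values):
    """Variance-to-mean ratio of a sample."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return var / mean


rng = random.Random(0)
# Same rate for every paper: the Poisson prediction, ratio close to 1.
homogeneous = [poisson_sample(4.0, rng) for _ in range(20000)]
# Hypothetical mixture of two rates with the same overall mean: ratio above 1.
mixed = [poisson_sample(rng.choice([1.0, 7.0]), rng) for _ in range(20000)]
F_homogeneous = variance_to_mean(homogeneous)
F_mixed = variance_to_mean(mixed)
```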

To the best of our knowledge, the statistical distribution of additional citations has not been
measured so far. This new kind of measurement (Fig. 1) reveals that the $\Delta k_i$-distributions
are broader than the Poissonian. To quantify this broadening we used the variance-to-mean
ratio, $F = \sigma^2/\lambda$, also known as the index of dispersion or Fano factor. Figure 3 shows that
$F \approx 1$ for small $k$, as expected for the Poisson distribution, while $F \gg 1$ for large $k$.
This strong deviation from the Poissonian indicates that Eq. (1) misses some important factor
which determines the growth of citation networks. We reasoned that the missing factor is
related to the citation history of papers. To probe this conjecture we considered the temporal
autocorrelation of the annual citations, $\Delta k_i(t)$. Since the typical citation history of a paper
is too short (10-15 years), the measurement of autocorrelation for a single paper is unreliable.
Therefore, we measured autocorrelation in the sets of papers that at a certain citing year $t$

FIG. 4: The Pearson autocorrelation coefficient for additional citations (Eq. (4)). Each point
corresponds to the set of papers with the same number of previous citations $k$ garnered by a
certain citing year $t$. The data for different $t$ almost collapse. The continuous line shows the
empirical approximation (Eq. (5)).

have the same number of previous citations $k$. Specifically, we found the number of citations
garnered by each paper in such a set during the current year and the last year, $\Delta k_{i,t}$ and
$\Delta k_{i,t-1}$, correspondingly, and calculated the Pearson autocorrelation coefficient

$$c_{t,t-1} = \frac{\overline{(\Delta k_{i,t} - \overline{\Delta k_{i,t}})(\Delta k_{i,t-1} - \overline{\Delta k_{i,t-1}})}}{\sigma_t \sigma_{t-1}} \qquad (4)$$

Here, $\sigma_t$, $\sigma_{t-1}$ are the standard deviations of the $\Delta k_{i,t}$ and $\Delta k_{i,t-1}$ distributions, respectively
($\sigma_t \approx \sigma_{t-1}$), and the averaging is performed over all papers in the set. This was done for all
$k$ and $t$. Figure 4 shows that $c_{t,t-1}$ grows with $k$. For moderately cited papers, $k \ll 60$,
the autocorrelation is weak while for highly cited papers, $k \gg 60$, the autocorrelation is
strong: $c \approx 1$. The empirical function

$$c_{t,t-1} = \frac{k + 3}{k + 60} \qquad (5)$$

fits our measurements well. Strong temporal autocorrelation of citations violates the
underlying assumption of the preferential attachment model [1], [5]: it turns out that citation
dynamics is not a Markov process since it depends on past history.
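
The autocorrelation measurement for one set of papers can be sketched as below, together with the empirical dependence $c = (k + 3)/(k + 60)$ quoted in the text. The helper names and the six toy papers are hypothetical; the real measurement averages over all papers sharing the same $k$ at a given citing year.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


def empirical_c(k):
    """Empirical autocorrelation c = (k + 3) / (k + 60)."""
    return (k + 3) / (k + 60)


# Hypothetical set of papers sharing the same k: citations in years t-1 and t.
dk_prev = [2, 5, 1, 8, 4, 6]
dk_curr = [3, 6, 1, 9, 3, 7]
c_measured = pearson(dk_prev, dk_curr)
```

For reference, `empirical_c` gives 0.05 at $k = 0$ and approaches 1 for $k \gg 60$, matching the weak-to-strong crossover described above.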



We suggest a more realistic growth model that is based on the first-order linear
autoregression,

$$\lambda_i = (1 - c)\,A\,(k_i + k_0)^{\alpha} + c\,\Delta k_{i,t-1} \qquad (6)$$

where $\lambda_i$ is the latent citation rate and $c$ is given by Eq. (5). The actual number of citations
is given by Eq. (3). Equation (6) introduces positive feedback between successive citations of
the same paper; in other words, it approximates the citation dynamics of a paper by the
inhomogeneous self-exciting point process. (Similar ideas were discussed in Refs. [18].)
The resulting preferential attachment model replaces Eq. (1) by Eq. (6) in such a way that
the stochastic term in Eq. (2) reduces to the Poissonian noise. Equation (6) states that the latent
citation rate of a paper [7] depends not only on the total number of accumulated citations
but on the recent citation rate as well. This accounts for the "sleeping beauties": the
papers that initially had a small number of citations but suddenly became popular. While the
conventional preferential attachment model (Eq. (1)) yields predominance of the "first-movers"
[lol , our more realistic model allocates a fair share of citations to "sleeping beauties".
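
To make the role of the second term concrete, here is a minimal sketch of the two-term rate with $c = (k + 3)/(k + 60)$. The value of `A`, the default parameters and the two example papers are hypothetical and serve only to show how two papers with the same total $k$ can have very different latent rates.

```python
def latent_rate(k, dk_prev, A, alpha=1.25, k0=1.1):
    """Two-term rate: (1 - c) * A * (k + k0)**alpha + c * dk_prev."""
    c = (k + 3) / (k + 60)  # empirical autocorrelation coefficient
    return (1 - c) * A * (k + k0) ** alpha + c * dk_prev


A = 0.05  # hypothetical value of the aging function at a late citing year
steady = latent_rate(k=100, dk_prev=5, A=A)     # cited steadily
awakened = latent_rate(k=100, dk_prev=30, A=A)  # recently "woke up"
```

Both papers have 100 accumulated citations, yet the one with 30 citations last year gets a much higher expected rate; the difference is exactly $c$ times the difference in last-year citations.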

To verify the multiplicative stochastic model described by Eq. (6) we chose all Physics
papers published in the same year (1984), fixed a certain citing year (t = 1986), measured
the number of total and last-year citations, $k_{i,t}$ and $\Delta k_{i,t-1}$, and calculated $\lambda_i$ for each
paper using Eq. (6) with the experimentally measured parameters $c(k)$, $A(t)$, $k_0$, and $\alpha(t)$. Then
we ran numerical simulations assuming a Poisson process with the rate given by Eq. (6), found
the number of citations of each paper in the year $t + 1$, and calculated the cumulative
distribution of citations. The procedure was repeated for the next year and so on. Figure 5
shows that this algorithm closely reproduces the actual citation distribution for each
citing year. This means that Eq. (6) yields an excellent description not only of the microscopic
citation dynamics but of the macroscopic citation distribution as well. On the other hand,
the numerical simulation that assumes only a Poisson process and ignores correlations does
not reproduce our measurements well (Fig. 5, right panel).
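
The year-by-year procedure can be sketched in a few lines. This is not the simulation code used for Fig. 5: the initial citation counts are invented, $\alpha$ is held constant instead of following the measured $\alpha(t)$, and the decay exponent of the aging function (2.0 below) is a placeholder assumption. All function names are hypothetical.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson variate (Knuth's method; adequate for moderate lam)."""
    limit, product, count = math.exp(-lam), rng.random(), 0
    while product > limit:
        product *= rng.random()
        count += 1
    return count


def step_year(papers, A, alpha, k0, rng):
    """Advance every paper by one citing year; papers holds [k, dk_prev] pairs."""
    for paper in papers:
        k, dk_prev = paper
        c = (k + 3) / (k + 60)  # empirical autocorrelation
        lam = (1 - c) * A * (k + k0) ** alpha + c * dk_prev
        dk = poisson_sample(lam, rng)
        paper[0], paper[1] = k + dk, dk


def cumulative_distribution(counts):
    """Pairs (k, fraction of papers with at least k citations)."""
    n, ordered = len(counts), sorted(counts)
    return [(k, (n - i) / n) for i, k in enumerate(ordered)
            if i == 0 or ordered[i - 1] != k]


rng = random.Random(1)
# Hypothetical initial condition: total and last-year citations of 500 papers.
papers = []
for _ in range(500):
    dk = rng.randint(0, 4)
    papers.append([dk + rng.randint(0, 3), dk])
initial_total = sum(k for k, _ in papers)
for t in range(3, 13):  # ten citing years
    A = 3.54 / (t + 0.3) ** 2.0  # exponent 2.0 is an assumption of this sketch
    step_year(papers, A, alpha=1.25, k0=1.1, rng=rng)
distribution = cumulative_distribution([k for k, _ in papers])
```

Dropping the `c * dk_prev` term (setting `c = 0`) turns this into the memoryless variant, which is the comparison the text describes.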

What are the implications of our study? We find that the cumulative citation distribution
is neither stable nor stationary but develops in time. Immediately after publication
the spread of initial conditions (journal circulation numbers) yields a convex cumulative
distribution of citations that can be fitted equally well by the (discrete) power-law [20]-[22]
or log-normal 10|, l23|, |2J, |3l| functions. Thereafter, the citation dynamics of most papers is
dominated by the first term in Eq. (6) in such a way that the citation history of papers that

FIG. 5: Cumulative citation distributions for 40,195 Physics papers published in 1984. The citing
year is indicated at each curve. Red symbols: measurements; blue symbols: numerical simulation
assuming the initial citation distributions of 1985 and 1986. Left panel: the full model based on
Eq. (3) and Eq. (6) provides excellent fits to the measured citation distributions. Right panel: the
incomplete model based on Eq. (1) (correlations ignored) and Eq. (3) underestimates citation counts.

managed to garner fewer than 50-70 citations is completed after 10-15 years. However, the
papers with more than 50-70 citations continue to be cited even after 10-15 years, their
dynamics being determined by the second term in Eq. (6), which does not decay with time. In
other words, while the bulk of the citation distribution becomes stable, the tail grows. In the
course of time its shape changes from convex to concave in such a way that for most
of the time the tail looks straight in log-log coordinates. Although such a power-law
tail was previously considered a fingerprint of the scale-free network, at least for the citation
network it turns out to be a transient phenomenon. The intrinsic scale of the citation
network, $k_{cr} = 50$-$70$, is clearly revealed in the microscopic dynamics (FigJl]). We conclude
that the almost power-law degree distribution of citations that was previously interpreted as
the indication of the scale-free network [l, y, y, 20 1 arises from the interplay between aging
[26], multiplicative stochastic process (Eq. (2)), and superlinear preferential attachment.

The two-term Eq. (6) implies that scientific papers constitute two broad classes with respect
to their longevity [10]. The citation rate of 90% of the papers achieves its maximum
2-3 years after publication and decays to zero in 10-15 years. The citation dynamics of
these papers is the aftereffect of their initial hit and is more or less predictable since the
impact of these papers is probably limited to several research groups and does not propagate
further. However, the citation rate of the 10% of the papers that overcome the tipping point [22]
of $k_{cr} = 50$-$70$ citations is determined more by their recent citation history. It seems
that these papers have a continuing impact [27] which propagates from one research group
to another in a cascade process, like in epidemics. This diffusion of scientific knowledge
[29] extends the paper longevity to much more than 10-15 years.

In summary, our measurements indicate that the mechanism that generates complex 
networks may be more sophisticated than the memoryless linear preferential attachment 
assumed so far. We propose a stochastic growth model that considers the evolution of the 
node degree as an inhomogeneous self-exciting point process. In the context of citations, the 
model is fully verified by our microscopic and macroscopic measurements and can serve for 
prognostication of the future citation behavior of a paper, group of papers, or of a journal 
impact factor. 

We are grateful to S. Redner, N. Shnerb, and A. Scharnhorst for insightful discussions. 

Supplementary material: Methodology

To find the time evolution of the number of citations of Physics papers we used the
Thomson-Reuters ISI Web of Science. We chose a certain publishing year (1984) that on the one hand
is distant enough from now, while on the other hand, the content of most of the
papers published in this year is now available in electronic format. We considered the
fields of Physics, Astrophysics, Optics, Crystallography, and Material Science and excluded
popular science and review journals. We performed the search using the 82 journals with the
highest annual number of publications and considered only articles, letters and notes, while
editorial material, comments, and reviews were excluded. To find Physics papers in the
multidisciplinary Nature and Science journals we looked through the titles of all publications
in 1984 and chose only those that in our opinion fall under the Physics category.

How representative is this list of journals? To answer this question we invoke Bradford's
law [30], which states that a few of the largest journals contain the dominant part of the papers in
the field. Figure 6 shows that the rank-size distribution for the set of the 82 largest journals
exhibits a power-law dependence with a cut-off. Extrapolation to = 1 yields that these 82 journals
contain ~95% of all Physics papers published in 1984.




FIG. 6: The rank-size plot of the Physics journals, where the size is the number of papers published
in each journal in 1984. The dashed line shows the power-law fit representing Bradford's law.



Supplementary material: The list of Physics journals 

Physical Review A,B,C,D; Physical Review Letters; Japanese Journal of Applied Physics
A,B; Journal of Physics*; Physics Letters A,B; Acta Crystallographica*; Journal of Chemical
Physics; Physica Status Solidi*; Nuclear Instruments*; Astrophysical Journal; Journal
of Physical Chemistry; Journal of Applied Physics; Journal de Physique*; Chemical Physics
Letters; Nuclear Physics*; Journal of the Optical Society of America*; Solid State
Communications; Zeitschrift für Physik*; Applied Physics Letters; Applied Optics; Surface Science;
Journal of the Physical Society of Japan; Journal of Vacuum Science*; Journal of
Non-Crystalline Solids; Journal of Crystal Growth; Comptes Rendus de l'Académie des Sciences
Série II*; Thin Solid Films; Journal of Materials Science; American Journal of Physics;
Physics of Fluids; Nuovo Cimento*; Review of Scientific Instruments; Physica ?; Chemical
Physics; Optics Communications; Journal of Magnetism and Magnetic Materials; Journal
of Luminescence; Journal of Fluid Mechanics; Philosophical Magazine*; Physica Scripta;
Canadian Journal of Physics; Journal of Statistical Physics; Optics Letters; Applied
Spectroscopy*; Solid State Ionics; Plasma Physics and Controlled Fusion; Journal of Physics and
Chemistry of Solids; Journal of Computational Physics; Communications in Mathematical
Physics; Zeitschrift für Kristallographie*; Nature; Science.

Supplementary material: Homogeneity of Physics

The homogeneity of our data set with respect to citations is ensured by the fact that
the papers in different Physics subdisciplines are published in the same journals. These
journals impose a certain standard on the reference list length, which provides a natural scale
for citations in this scientific discipline. Indeed, consider the cohort of all papers that were
published in the same year and the citation distribution for this cohort after many years.
Since all these papers will be cited predominantly by papers in the same discipline, and
neglecting the 2% annual growth of the number of publications, the mean number of citations
would be equal to the average length of the reference list. This condition of stationarity was
the rationale for our choice of the whole Physics discipline rather than a single subdiscipline.
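
The stationarity argument can be written out as a short bookkeeping identity. This is a sketch under stated assumptions: $N$ (papers published per year), $R$ (mean reference-list length) and $f(\tau)$ (the fraction of references that point to papers of age $\tau$) are symbols introduced here for illustration and are not part of the original notation.

```latex
% Assumptions: every reference points inside the discipline, and the yearly
% output N and the mean reference-list length R do not change with time.
\begin{align}
  \text{citations received by one cohort}
      &= \sum_{\tau \ge 0} N R\, f(\tau), \qquad \sum_{\tau \ge 0} f(\tau) = 1, \\
  \text{mean citations per paper}
      &= \frac{1}{N} \sum_{\tau \ge 0} N R\, f(\tau) = R .
\end{align}
```

In words: each later year contributes $N R$ references in total, of which the fraction $f(\tau)$ lands on the cohort of age $\tau$; summing over all ages returns one full reference list per paper.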

This homogeneity of Physics with respect to citations is also revealed by inspection
of the cumulative citation distributions for the papers published in different Physics journals.
Indeed, in the researcher's perception, theoretical and experimental papers could have
different citation patterns. To find out to what extent this is true we chose three large Physics
journals that do not have a page limitation: Physical Review B (PRB), Journal of Applied
Physics (JAP), and Review of Scientific Instruments (RSI). As seen from the reader/author's
perspective, PRB publishes theoretical and experimental papers in fair proportion, JAP is
more biased towards experimental papers, and RSI publishes almost exclusively experimental
papers. Figure 7 shows the cumulative citation distribution for the papers published in these
journals in the decade 1980-1989. Although these distributions look different, they almost
collapse after the rescaling $k \to k/m$, where $m$ is a journal-specific constant [31]. This means
that the growth model for citations of the papers published in these three journals is actually
the same and only the scale (initial conditions, namely, the number of citations garnered by
a paper during the first 2-3 years after publication) is different.
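
The rescaling test can be illustrated with a toy example. The text does not say how the journal-specific constant $m$ is chosen; the sketch below assumes, purely for illustration, that $m$ is the mean citation count of the journal. The two invented "journals" differ only by a factor of 3 in scale, so their rescaled distributions coincide exactly.

```python
def survival(values):
    """Pairs (x, fraction of entries with value at least x)."""
    n, ordered = len(values), sorted(values)
    return [(x, (n - i) / n) for i, x in enumerate(ordered)
            if i == 0 or ordered[i - 1] != x]


def rescale(counts, m):
    """Apply k -> k / m to every citation count."""
    return [k / m for k in counts]


# Hypothetical citation counts; journal_b is journal_a scaled by a factor of 3.
journal_a = [1, 2, 2, 4, 8, 16]
journal_b = [3 * k for k in journal_a]
m_a = sum(journal_a) / len(journal_a)  # assumed choice of m: the journal mean
m_b = sum(journal_b) / len(journal_b)
collapsed = survival(rescale(journal_a, m_a)) == survival(rescale(journal_b, m_b))
```

With real data the collapse is only approximate, so one would compare the rescaled curves on a plot rather than test for equality.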

The homogeneity of the Physics discipline with respect to citations also becomes evident
from consideration of the list of the most cited Physics papers. As an example we chose
the fifteen most cited papers published in the decade 1980-1989 and cited by 2008. Notably,
the list below is not dominated by a single subdiscipline. Although the theoretical papers
prevail, the experimental papers are represented fairly well (26.6%). (The absence of several
Nobel prize-winning papers that appeared during this period is explained by the fact that
they quickly became common knowledge.)

FIG. 7: Cumulative distribution of citations to the papers published in 1980-1989 and cited by 2008.
We show results for the Physical Review B (PRB), Journal of Applied Physics (JAP) and Review
of Scientific Instruments (RSI). Although citation distributions for these three large journals look
different (left panel), the difference is in scale and not in shape. The scaled distributions (right
panel) are much more alike.

Supplementary material: Comparison to other disciplines

We performed similar studies using Mathematics and Economics papers. Unlike the Physics
study, where we tried to cover the whole field in order to achieve good statistics, here we
adopted a different choice and considered only pure Mathematics and pure Economics
papers. The resulting data set is more homogeneous but does not have enough statistics
due to the relatively small number of papers. We used the Thomson Reuters ISI Web of Science,
chose all relevant Math journals (500 titles), and considered all original research Math papers
published in these journals in one year (1984), excluding review articles, comments, editorials,
etc. This left 6313 Math papers. A similar search in the field of Economics (165 relevant
journals) yielded 3043 original research papers published in 1984. We analyzed the citation
history of these papers from 1984 to 2011.

FIG. 8: Mean annual citation rate, $\lambda(k) = \overline{\Delta k_i}$, as a function of the number of previous citations
$k$ for the Mathematics, Economics, and Physics papers, all published in 1984; $t$ is the number
of years after publication. The continuous lines show superlinear approximations, $\lambda = A(k + k_0)^{\alpha}$,
where $k_0 = 0.7$ for Mathematics, $k_0 = 1$ for Economics, and $k_0 = 1.1$ for Physics. The exponent
$\alpha(t)$ is time-dependent as shown in the insets. Note the different scales in (a), (b), and (c).

For each paper $i$ we determined $k_{i,t}$, the total number of citations accumulated after $t$
years ($t = T_{cit} - T_{publ} + 1$), and $\Delta k_{i,t}$, the number of additional citations gained by the paper
in the year $t + 1$. We chose a certain citing year $t$ and grouped all papers into logarithmically
spaced bins, each bin containing the papers with close $k$, and found the mean, $\lambda(k) = \overline{\Delta k_i}$,
of each distribution.

Figure 8 shows that the $\lambda(k)$ plots for all three disciplines can be fitted by the superlinear
dependence

$$\lambda = A\,(k + k_0)^{\alpha} \qquad (7)$$

where $k$ is the number of previous citations, $k_0$ is the "initial attractivity", $\alpha$ is the
attachment exponent, and $A(t)$ is the aging function.
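
One simple way to extract $A$ and $\alpha$ from binned rates, once $k_0$ is fixed, is a straight-line fit in log-log coordinates, since $\log\lambda = \log A + \alpha \log(k + k_0)$. The sketch below is an assumption about how such a fit could be done, not a description of the fitting procedure actually used; the synthetic data and parameter values are invented.

```python
import math


def fit_power_law(ks, rates, k0):
    """Least-squares fit of log(rate) = log(A) + alpha * log(k + k0)."""
    xs = [math.log(k + k0) for k in ks]
    ys = [math.log(r) for r in rates]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    alpha = sxy / sxx
    A = math.exp(my - alpha * mx)
    return A, alpha


# Synthetic, noise-free data generated with hypothetical parameters.
true_A, true_alpha, k0 = 0.1, 1.25, 1.1
ks = [0, 1, 3, 10, 30, 100, 300]
rates = [true_A * (k + k0) ** true_alpha for k in ks]
A_fit, alpha_fit = fit_power_law(ks, rates, k0)
```

On noisy data, bins with a zero mean rate would have to be excluded before taking logarithms, and a weighted or nonlinear fit may be preferable.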

We observe that the citation dynamics for all three disciplines follows the superlinear
preferential attachment mechanism (Eq. (7)) with similar parameters $\alpha \approx 1.2$-$1.3$, $k_0 \approx 1$ and
the aging function decaying with time as A ∝ t^^'^ in the first decade after publication.
(The divergence of $A$ after 10 years is most probably related to the different annual growth rates
of the number of publications in these disciplines.) The Pearson autocorrelation coefficient
for the Mathematics and Economics papers (not shown here) can be approximated by the
same dependence, $c_{t,t-1} = (k + 3)/(k + 60)$, found for the Physics papers. All this provides
strong evidence of the generality of our results.

FIG. 9: $A$, the aging function, for the Mathematics, Economics and Physics papers. The data for
the first couple of years are different, the data for the subsequent 6-8 years are close, and 10 years
after publication the data diverge once again.